Why Crashes Happen Here - A Combination Model of Land Use and Road Network
Keywords:
Crash risk, Land use, Road network, Machine learning
Research Background and Motivation:
With accelerating urbanization and motorization, traffic safety has become a common challenge for cities worldwide. While traditional studies have focused on driver behavior (1), vehicle characteristics (2), or local roadway geometry (3), a growing body of evidence highlights the foundational role of land use (4) patterns and road network (5) structure in shaping traffic conflicts and crash risk. Network topology, in turn, determines how traffic is distributed, which links become structural bottlenecks or unavoidable corridors, and where crash hotspots tend to emerge.
This research aims to characterize how land use and road network structure affect crash occurrence.
(1) Ma, M., Yan, X., Huang, H., & Abdel-Aty, M. (2010). Safety of Public Transportation Occupational Drivers: Risk Perception, Attitudes, and Driving Behavior. Transportation Research Record: Journal of the Transportation Research Board, 2145(1), 72-79. https://doi.org/10.3141/2145-09
(2) Metzger, K. B., Sartin, E., Foss, R. D., Joyce, N., & Curry, A. E. (2020). Vehicle safety characteristics in vulnerable driver populations. Traffic Injury Prevention, 21(sup1), S54–S59. https://doi.org/10.1080/15389588.2020.1805445
(3) Reagan, J. A. (1994). Designing for safety by analyzing road geometric. Public Roads, 63(2), 21-27.
(4) Pulugurtha, S. S., Duddu, V. R., & Kotagiri, Y. (2013). Traffic analysis zone level crash estimation models based on land use characteristics. Accident Analysis & Prevention, 50, 678-687.
(5) Wang, X., Wu, X., Abdel-Aty, M., et al. (2013). Investigation of road network features and safety performance. Accident Analysis & Prevention, 56, 22-31.
Direction of Research - land use
The land-use part of the analysis is presented at the link below. It combines Python analysis with an ArcGIS story map.
https://storymaps.arcgis.com/stories/6dfb2e4b4d314008acafb68f1cffd10c
This section mainly demonstrates how land use influences traffic crashes. It identifies the spatiotemporal patterns of crash occurrence and further examines how land-use functions and land-use transitions may affect the likelihood of crashes.
Direction of Research - road network
| | CRN | ARRIVAL_TM | AUTOMOBILE_COUNT | BELTED_DEATH_COUNT | BELTED_SUSP_SERIOUS_INJ_COUNT | BICYCLE_COUNT | BICYCLE_DEATH_COUNT | BICYCLE_SUSP_SERIOUS_INJ_COUNT | BUS_COUNT | CHLDPAS_DEATH_COUNT | ... | WORK_ZONE_LOC | WORK_ZONE_TYPE | WZ_CLOSE_DETOUR | WZ_FLAGGER | WZ_LAW_OFFCR_IND | WZ_LN_CLOSURE | WZ_MOVING | WZ_OTHER | WZ_SHLDER_MDN | WZ_WORKERS_INJ_KILLED |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020029257 | 1957.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2020008631 | 942.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2020006834 | 700.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2020006451 | 825.0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2020008695 | 1115.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45585 | 2025027435 | 510.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 45586 | 2025012535 | NaN | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 45587 | 2025043500 | 2031.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 45588 | 2025007815 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 45589 | 2025026159 | 238.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
45590 rows × 99 columns
The table above shows the integrated traffic crash data for Philadelphia from 2020 to 2024. Crash records from multiple CSV files were read and concatenated into a single dataset, resulting in a database of 45,590 observations and 99 variables. The dataset includes detailed information on crash timing, involved modes, and injury outcomes, providing a solid foundation for subsequent spatiotemporal and network-based analyses.
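The read-and-concatenate step can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the in-memory CSV strings and their toy rows are hypothetical stand-ins for the yearly crash files, whose names are not given here.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the yearly crash CSV files (toy rows only).
yearly_csvs = {
    "year_a": "CRN,ARRIVAL_TM,AUTOMOBILE_COUNT\n1001,1957,1\n1002,942,0\n",
    "year_b": "CRN,ARRIVAL_TM,AUTOMOBILE_COUNT\n2001,700,1\n",
}

# Read each file into its own DataFrame.
frames = [pd.read_csv(io.StringIO(text)) for text in yearly_csvs.values()]

# Stack the yearly tables; ignore_index=True rebuilds a clean 0..n-1 index
# instead of repeating each file's own row numbers.
crashes = pd.concat(frames, ignore_index=True)

print(crashes.shape)
```

With real files, `pd.read_csv` would take a file path instead of a `StringIO` object. Concatenation aligns columns by name, so a column missing from one year is filled with NaN for that year's rows.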
This step extracts the unified boundary of Philadelphia using planning district polygons. By dissolving individual planning districts into a single geometry, a consistent citywide boundary is obtained. This boundary is used to constrain both road network extraction and crash analysis, ensuring spatial consistency across all analytical steps.
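A minimal sketch of the dissolve operation, assuming GeoPandas is used. The two unit squares and the `district` column are toy data standing in for the planning district polygons.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy "planning districts": two adjacent unit squares (hypothetical data).
districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# dissolve() without a `by` argument merges all rows into one geometry.
city = districts.dissolve()

# The single merged polygon serves as the citywide boundary.
boundary = city.geometry.iloc[0]
print(boundary.geom_type, boundary.area)
```

The shared edge between the two squares disappears in the result, which is why a dissolved boundary is safer for clipping than the individual district polygons.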
[Figure: plot of the Philadelphia drivable road network]
Using the extracted city boundary, a drivable road network for Philadelphia was downloaded from OpenStreetMap via OSMnx. The resulting network, shown after projection, captures the spatial structure and connectivity of the city’s roadway system. This network serves as the base for linking crash occurrences to specific road segments.
| u | v | key | osmid | oneway | name | highway | reversed | length | lanes | maxspeed | service | geometry | ref | bridge | access | tunnel | width | junction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 109726936 | 109726940 | 0 | 12108955 | True | Brunner Street | residential | False | 174.382 | NaN | NaN | NaN | LINESTRING (-75.15542 40.01863, -75.15717 40.0... | NaN | NaN | NaN | NaN | NaN | NaN |
| 109726936 | 109992543 | 0 | 43226669 | False | Germantown Avenue | primary | False | 55.074 | 2 | 25 mph | disused_tram | LINESTRING (-75.15542 40.01863, -75.15569 40.0... | NaN | NaN | NaN | NaN | NaN | NaN |
| 109726936 | 109992535 | 0 | 43226669 | False | Germantown Avenue | primary | True | 7.178 | 2 | 25 mph | disused_tram | LINESTRING (-75.15542 40.01863, -75.15538 40.0... | NaN | NaN | NaN | NaN | NaN | NaN |
| 109726940 | 109726950 | 0 | 302956448 | False | Wayne Avenue | residential | False | 11.448 | NaN | NaN | NaN | LINESTRING (-75.15717 40.01782, -75.15710 40.0... | NaN | NaN | NaN | NaN | NaN | NaN |
| 109726940 | 110047495 | 0 | 302956448 | False | Wayne Avenue | residential | True | 34.627 | NaN | NaN | NaN | LINESTRING (-75.15717 40.01782, -75.15738 40.0... | NaN | NaN | NaN | NaN | NaN | NaN |
In this step, the road network is converted into a GeoDataFrame containing only edge-level information. Each road segment includes attributes such as OpenStreetMap ID, road name, functional class (e.g., residential, primary), directionality, segment length, and number of lanes. These attributes enable analysis of how roadway characteristics relate to crash occurrence.
| | CRN | ARRIVAL_TM | AUTOMOBILE_COUNT | BELTED_DEATH_COUNT | BELTED_SUSP_SERIOUS_INJ_COUNT | BICYCLE_COUNT | BICYCLE_DEATH_COUNT | BICYCLE_SUSP_SERIOUS_INJ_COUNT | BUS_COUNT | CHLDPAS_DEATH_COUNT | ... | WORK_ZONE_TYPE | WZ_CLOSE_DETOUR | WZ_FLAGGER | WZ_LAW_OFFCR_IND | WZ_LN_CLOSURE | WZ_MOVING | WZ_OTHER | WZ_SHLDER_MDN | WZ_WORKERS_INJ_KILLED | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020029257 | 1957.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.09453 39.99463) |
| 1 | 2020008631 | 942.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.00614 40.04026) |
| 2 | 2020006834 | 700.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.16127 40.02431) |
| 3 | 2020006451 | 825.0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-74.99057 40.11199) |
| 4 | 2020008695 | 1115.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.20008 40.00740) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45585 | 2025027435 | 510.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.17294 39.92485) |
| 45586 | 2025012535 | NaN | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.11726 40.03163) |
| 45587 | 2025043500 | 2031.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.15036 39.99339) |
| 45588 | 2025007815 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.15567 40.02052) |
| 45589 | 2025026159 | 238.0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-75.11345 39.99644) |
45405 rows × 100 columns
The table above shows the crash records after conversion from tabular data into spatial point features using the reported latitude and longitude coordinates. After removing records with missing location information (45,405 of the 45,590 records remain), crash points are stored as a GeoDataFrame in the WGS84 coordinate system (EPSG:4326). This step enables spatial linkage between crash events and the road network.
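The conversion can be sketched as below, assuming GeoPandas. The column names `DEC_LAT` and `DEC_LONG` are assumptions for illustration (the coordinate columns are not shown in the truncated tables), and the rows are toy data.

```python
import geopandas as gpd
import pandas as pd

# Toy crash table; DEC_LAT / DEC_LONG are assumed column names.
crash_df = pd.DataFrame(
    {
        "CRN": [1, 2, 3],
        "DEC_LAT": [39.99463, None, 40.02431],
        "DEC_LONG": [-75.09453, -75.00614, -75.16127],
    }
)

# Drop records that lack either coordinate.
located = crash_df.dropna(subset=["DEC_LAT", "DEC_LONG"])

# Build point geometries (x = longitude, y = latitude) in WGS84.
crash_gdf = gpd.GeoDataFrame(
    located,
    geometry=gpd.points_from_xy(located["DEC_LONG"], located["DEC_LAT"]),
    crs="EPSG:4326",
)

print(len(crash_gdf), crash_gdf.crs)
```

Note the argument order of `points_from_xy`: longitude comes first because it is the x coordinate.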
(45331, 100)
(15500, 3)
This step assigns each crash event to its nearest road segment in the network. First, the unioned road geometry is used to define the effective city boundary, and crash points are filtered to retain only those located within this boundary. Both the road network and crash points are then projected to a planar coordinate system (EPSG:2272) to ensure accurate distance calculations. Using OSMnx’s nearest edge function, each crash point is matched to the closest road segment based on spatial proximity. The corresponding node pairs (u, v) are recorded for each crash, allowing crashes to be aggregated at the road-segment level. This results in an edge-based crash count dataset, which forms the basis for subsequent calculation of crash intensity and risk metrics.
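To make the matching logic concrete, here is a small pure-Python sketch. It is a brute-force stand-in for OSMnx's nearest-edge lookup, not the library's implementation: each edge is simplified to a straight segment, the coordinates are toy values in a projected (planar) system, and every crash is compared against every edge.

```python
from collections import Counter
from math import hypot


def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to the segment from A to B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: A and B coincide
        return hypot(px - ax, py - ay)
    # Projection parameter of P onto AB, clamped to stay on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


# Toy edges keyed by node pair (u, v), in projected coordinates.
edges = {
    (1, 2): ((0.0, 0.0), (100.0, 0.0)),
    (2, 3): ((100.0, 0.0), (100.0, 100.0)),
}

# Toy crash locations in the same projected coordinates.
crash_points = [(10.0, 5.0), (60.0, -3.0), (95.0, 50.0)]


def nearest_edge(px, py):
    """Return the (u, v) key of the edge closest to the point."""
    return min(
        edges,
        key=lambda uv: point_segment_distance(px, py, *edges[uv][0], *edges[uv][1]),
    )


# Aggregate crashes per edge.
crash_counts = Counter(nearest_edge(x, y) for x, y in crash_points)
print(crash_counts)
```

The projection to a planar system matters here: the distance formula assumes equal units on both axes, which does not hold for raw longitude and latitude. For a city-scale network, a spatial index is needed in place of the exhaustive comparison.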
| | u | v | key | osmid | oneway | name | highway | reversed | length | lanes | maxspeed | service | geometry | ref | bridge | access | tunnel | width | junction | crash_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 109726936 | 109726940 | 0 | 12108955 | True | Brunner Street | residential | False | 174.382 | NaN | NaN | NaN | LINESTRING (-75.15542 40.01863, -75.15717 40.0... | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| 1 | 109726936 | 109992543 | 0 | 43226669 | False | Germantown Avenue | primary | False | 55.074 | 2 | 25 mph | disused_tram | LINESTRING (-75.15542 40.01863, -75.15569 40.0... | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 |
| 2 | 109726936 | 109992535 | 0 | 43226669 | False | Germantown Avenue | primary | True | 7.178 | 2 | 25 mph | disused_tram | LINESTRING (-75.15542 40.01863, -75.15538 40.0... | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 |
| 3 | 109726940 | 109726950 | 0 | 302956448 | False | Wayne Avenue | residential | False | 11.448 | NaN | NaN | NaN | LINESTRING (-75.15717 40.01782, -75.15710 40.0... | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 |
| 4 | 109726940 | 110047495 | 0 | 302956448 | False | Wayne Avenue | residential | True | 34.627 | NaN | NaN | NaN | LINESTRING (-75.15717 40.01782, -75.15738 40.0... | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
Road segment data are merged with crash counts using node pairs (u, v) as unique identifiers. Crash counts are assigned to each road segment, with segments experiencing no crashes set to zero. This produces a complete edge-level dataset that directly associates roadway characteristics with crash frequency.
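The merge can be sketched with pandas as follows. The node IDs, lengths, and counts are copied from the first three rows of the table above, but the DataFrame names are illustrative and the real edge table carries many more columns.

```python
import pandas as pd

# Edge table reduced to a few columns (values from the table above).
edges = pd.DataFrame(
    {
        "u": [109726936, 109726936, 109726936],
        "v": [109726940, 109992543, 109992535],
        "key": [0, 0, 0],
        "length": [174.382, 55.074, 7.178],
    }
)

# Crash counts exist only for edges that received at least one crash.
counts = pd.DataFrame(
    {
        "u": [109726936, 109726936],
        "v": [109726940, 109992543],
        "crash_count": [1, 4],
    }
)

# A left join keeps every edge; unmatched edges get NaN, then 0.
merged = edges.merge(counts, on=["u", "v"], how="left")
merged["crash_count"] = merged["crash_count"].fillna(0)

print(merged[["u", "v", "crash_count"]])
```

One caveat with this design: in a multigraph, parallel edges share the same (u, v) pair and differ only in `key`, so a join on (u, v) alone gives every parallel edge the same count. Including `key` in both the aggregation and the join would avoid that.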
This histogram presents the distribution of crash index values across road segments with nonzero crash occurrences. The distribution is approximately unimodal, with most values concentrated in the mid-range and a noticeable right-skewed tail. This pattern indicates that while most road segments experience moderate crash risk, a small subset exhibits disproportionately high risk, highlighting spatial inequality in crash exposure.